CLAIMS 

What is claimed is: 

1. A method for distributing data items from a particular set of data into a plurality of buckets based on distribution keys associated with said data items, the method comprising the steps of:

    randomly selecting data items from said particular set of data to produce a sampled set of data items;

    determining a plurality of ranges based on the distribution keys associated with the sampled set of data items;

    assigning said plurality of ranges to said plurality of buckets; and

    distributing each data item in said particular set of data to the bucket that has been assigned the range into which falls the distribution key of the data item.
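
By way of non-limiting illustration, the steps recited in Claim 1 might be sketched in Python as follows. The function names (`sample_keys`, `range_boundaries`, `distribute`), the per-item sampling with probability `fraction`, and the choice of equally populated ranges are assumptions made for this example; the claim itself does not prescribe them.

```python
import bisect
import random


def sample_keys(items, key, fraction, rng):
    """Randomly select roughly `fraction` of the items and return their keys."""
    return [key(item) for item in items if rng.random() < fraction]


def range_boundaries(sampled_keys, num_buckets):
    """Pick num_buckets - 1 split points from the sorted sampled keys.

    Assumption: ranges are chosen so each holds about the same number
    of sampled keys. An empty sample yields a single unbounded range.
    """
    ordered = sorted(sampled_keys)
    if not ordered:
        return []
    return [ordered[(i * len(ordered)) // num_buckets]
            for i in range(1, num_buckets)]


def distribute(items, key, boundaries):
    """Place every item in the bucket whose range contains its key.

    Bucket i is assigned the range between boundaries[i - 1] (inclusive)
    and boundaries[i] (exclusive); the first and last are open-ended.
    """
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for item in items:
        buckets[bisect.bisect_right(boundaries, key(item))].append(item)
    return buckets
```

Because the boundaries are derived from a sample rather than from the full set of data, the resulting buckets are only approximately equal in size; a larger sample generally tracks the full key distribution more closely.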

2. The method of Claim 1 wherein the step of randomly selecting data items from said particular set of data includes randomly selecting data items from each subset of a plurality of subsets of said particular set of data.

3. The method of Claim 2 wherein the step of randomly selecting data items from each subset of a plurality of subsets of said particular set of data includes randomly selecting data items from each partition of a partitioned table.

4. The method of Claim 2 wherein the step of randomly selecting data items from each subset of a plurality of subsets of said particular set of data includes randomly selecting data items from subsets of data, stored in buffers in volatile memory, that represent results of one or more previously performed operations.

5. The method of Claim 1 further comprising the steps of:

    assigning the plurality of buckets to a plurality of processes; and

    causing each process of said plurality of processes to perform, in parallel with the other processes of said plurality of processes, an operation on the data items contained in any buckets assigned to the process.
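
By way of non-limiting illustration, the parallel step of Claim 5 might be sketched as follows. Worker threads from Python's `concurrent.futures.ThreadPoolExecutor` are used here as a stand-in for the claimed processes, and the name `process_buckets_in_parallel` and the `max_workers` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_buckets_in_parallel(buckets, operation, max_workers=4):
    """Apply `operation` to every bucket, with buckets handled concurrently.

    Assumption: threads stand in for the processes of the claim. Results
    are returned in bucket order because Executor.map preserves the order
    of its inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(operation, buckets))
```

When the operation is a sort and the buckets hold consecutive key ranges, for example `process_buckets_in_parallel(buckets, sorted)`, concatenating the per-bucket results in bucket order gives a fully sorted sequence without a final merge.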

6. The method of Claim 2 further comprising the step of selecting a distinct random seed for each subset of the plurality of subsets of said particular set of data.
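
By way of non-limiting illustration, sampling from each subset with a distinct random seed (Claims 2 and 6) might be sketched as follows. The name `sample_each_subset` and the scheme of deriving each seed as `base_seed + index` are assumptions for the example; the claims require only that the seeds be distinct.

```python
import random


def sample_each_subset(subsets, fraction, base_seed=0):
    """Sample roughly `fraction` of the items of every subset.

    Each subset gets its own generator seeded with a distinct value
    (hypothetical scheme: base_seed + subset index), so no two subsets
    replay the same sequence of random draws.
    """
    sampled = []
    for index, subset in enumerate(subsets):
        rng = random.Random(base_seed + index)
        sampled.extend(item for item in subset if rng.random() < fraction)
    return sampled
```

One plausible reason for distinct seeds is that identically seeded generators would select the same positions within every subset, which correlates the per-subset samples.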

7. The method of Claim 1 wherein:

    the particular set of data is durably stored on a plurality of durable storage units; and

    the step of randomly selecting data items from said particular set of data to produce a sampled set of data items includes randomly selecting durable storage units from said plurality of durable storage units and using the data items stored on said randomly selected durable storage units as the sampled set of data items.
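
By way of non-limiting illustration, the storage-unit sampling of Claim 7 might be sketched as follows. Each durable storage unit is modeled as a Python list of the data items stored on it, which is an assumption of the example, as are the name `sample_by_storage_unit` and the `unit_fraction` parameter.

```python
import random


def sample_by_storage_unit(units, unit_fraction, rng):
    """Randomly choose whole storage units and return all of their items.

    `units` is a list of lists, one inner list per storage unit
    (a modeling assumption). At least one unit is chosen whenever
    any unit exists.
    """
    count = min(len(units), max(1, round(len(units) * unit_fraction)))
    chosen = rng.sample(units, count)
    return [item for unit in chosen for item in unit]
```

Selecting whole units means only the chosen units need to be read, at the cost of a sample that is less uniform when items stored on the same unit have similar keys.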

8. The method of Claim 1 wherein the step of randomly selecting data items includes selecting a specified percentage of data items in said particular set of data.

9. The method of Claim 7 wherein the step of randomly selecting data items includes selecting a specified percentage of the plurality of durable storage units that are storing said particular set of data.

10. The method of Claim 8 further comprising the step of receiving, from a user, data that specifies said percentage.

11. The method of Claim 9 further comprising the step of receiving, from a user, data that specifies said percentage.

12. The method of Claim 5 wherein said operation is specified in a database command, the method further comprising receiving with said database command data that indicates how much of said particular set of data to randomly select to produce said sampled set of data items.

13. The method of Claim 1 wherein the step of determining a plurality of ranges based on the distribution keys associated with the sampled set of data items includes determining ranges that contain an approximately equal amount of distribution keys associated with said sampled set of data items.

14. A computer-readable medium carrying instructions for distributing data items from a particular set of data into a plurality of buckets based on distribution keys associated with said data items, the instructions comprising instructions for performing the steps of:

    randomly selecting data items from said particular set of data to produce a sampled set of data items;

    determining a plurality of ranges based on the distribution keys associated with the sampled set of data items;

    assigning said plurality of ranges to said plurality of buckets; and

    distributing each data item in said particular set of data to the bucket that has been assigned the range into which falls the distribution key of the data item.

15. The computer-readable medium of Claim 14 wherein the step of randomly selecting data items from said particular set of data includes randomly selecting data items from each subset of a plurality of subsets of said particular set of data.

16. The computer-readable medium of Claim 15 wherein the step of randomly selecting data items from each subset of a plurality of subsets of said particular set of data includes randomly selecting data items from each partition of a partitioned table.

17. The computer-readable medium of Claim 15 wherein the step of randomly selecting data items from each subset of a plurality of subsets of said particular set of data includes randomly selecting data items from subsets of data, stored in buffers in volatile memory, that represent results of one or more previously performed operations.

18. The computer-readable medium of Claim 14 further comprising instructions for performing the steps of:

    assigning the plurality of buckets to a plurality of processes; and

    causing each process of said plurality of processes to perform, in parallel with the other processes of said plurality of processes, an operation on the data items contained in any buckets assigned to the process.

19. The computer-readable medium of Claim 15 further comprising instructions for performing the step of selecting a distinct random seed for each subset of the plurality of subsets of said particular set of data.

20. The computer-readable medium of Claim 14 wherein:

    the particular set of data is durably stored on a plurality of durable storage units; and

    the step of randomly selecting data items from said particular set of data to produce a sampled set of data items includes randomly selecting durable storage units from said plurality of durable storage units and using the data items stored on said randomly selected durable storage units as the sampled set of data items.

21. The computer-readable medium of Claim 14 wherein the step of randomly selecting data items includes selecting a specified percentage of data items in said particular set of data.

22. The computer-readable medium of Claim 20 wherein the step of randomly selecting data items includes selecting a specified percentage of the plurality of durable storage units that are storing said particular set of data.

23. The computer-readable medium of Claim 21 further comprising instructions for performing the step of receiving, from a user, data that specifies said percentage.

24. The computer-readable medium of Claim 22 further comprising instructions for performing the step of receiving, from a user, data that specifies said percentage.

25. The computer-readable medium of Claim 18 wherein said operation is specified in a database command, the computer-readable medium further comprising instructions for receiving with said database command data that indicates how much of said particular set of data to randomly select to produce said sampled set of data items.

26. The computer-readable medium of Claim 14 wherein the step of determining a plurality of ranges based on the distribution keys associated with the sampled set of data items includes determining ranges that contain an approximately equal amount of distribution keys associated with said sampled set of data items.